REMARKS 



Claims 1, 18, 43, 50, 57, 61, 64, and 65 have been amended. Claims 1-65 remain 
pending in the application. Reconsideration is respectfully requested in light of the 
following remarks. 

Section 103(a) Rejections : 

The Examiner rejected claims 1, 2, 4, 5, 13-16, 18, 19, 21-23, 30, 33-44, 49-51 
and 56-65 under 35 U.S.C. § 103(a) as being unpatentable over Huppenthal, et al. (U.S. 
Patent 6,339,819) (hereinafter "Huppenthal") in view of Hinds, et al. (U.S. Patent 
6,542,916) (hereinafter "Hinds"), and further in view of Chen et al. (U.S. Patent 
6,763,365) (hereinafter "Chen"), claims 3, 20, 45 and 52 as being unpatentable over 
Huppenthal, Hinds and Chen, and further in view of Lasher et al. (U.S. Patent 4,863,247) 
(hereinafter "Lasher"), claims 6-12, 24-29, 31, 32, 48, 53, 54 and 55 as being 
unpatentable over Huppenthal, Hinds and Chen, and further in view of Stribaek et al. 
(U.S. Patent 7,181,484) (hereinafter "Stribaek"), claim 17 as being unpatentable over 
Huppenthal, Hinds and Chen, and further in view of Chen et al. (U.S. Patent 6,687,725) 
(hereinafter "Chen2"), and claims 46 and 47 as being unpatentable over Huppenthal, 
Hinds, Chen and Lasher and further in view of Stribaek. Applicants respectfully traverse 
these rejections for at least the following reasons. 

In the Response to Arguments section of the present Office Action, the Examiner 
submits that Applicants' Specification (on page 8, at paragraph [1056], lines 6-8; and on 
page 15, at paragraph [1076], lines 6-8]) discloses "the concept of the multiply- 
accumulate-chaining operation. The operation chaining concept discloses the chaining of 
operations in order to perform successive arithmetic operations in response to a single 
initiated instruction. The arithmetic operations can be multiplication, accumulation, 
carry-save, partial multiplication, and etc." The Examiner is clearly mischaracterizing 
the teachings of Applicants' Specification. 



10/626,420 (6000-3230 1/P9388) 



16 



Meyertons, Hood, Kivlin, Kowert & Goetzel., P.C. 



Paragraph [1056] of Applicants' Specification actually states, in its entirety, the 
following: 

"Multi-word multiplications and additions may be computed using many word- 
sized multiply, add, and shift operations. As shown in FIG. 1, a 1024-bit integer 
X, for example, can be represented with sixteen 64-bit words X=(xl5, . . . , xl, 
xO). That form is commonly referred to as multi-precision representation. The 
more efficiently a multi-word operation can be performed, the better the public- 
key performance will be. Adding capabilities to a general purpose processor to 
speed up multi-word operations is a key to accelerating public-key computations, 
and add-chaining and multiply-chaining are two of the capabilities needed." 

This passage describes that adding add-chaining and multiply-chaining 
capabilities to a general purpose processor (such as by adding the new processor 
instructions of Applicants' claimed invention) will speed up multi-word operations , and 
therefore, can be used to accelerate public-key computations, which typically include 
such multi-word operations. This passage does not describe "multiply-accumulate- 
chaining," or "the chaining of operations in order to perform successive arithmetic 
operations in response to a single initiated instruction." In fact, it is not clear what the 
Examiner means by the phrase "the chaining of operations in order to perform successive 
arithmetic operations in response to a single initiated instruction." Applicants' claims 
describe a relationship between a currently executing single arithmetic instruction of the 
processor instruction set and a previously executed single arithmetic instruction of the 
processor instruction set, not just a collection of isolate arithmetic operations that are 
performed in response to a single initiated instruction. 

Paragraph [1076] of Applicants' Specification actually states, in its entirety, the 
following: 

The umulxck instruction is an efficient way to support public-key computations. 
Back-to-back scheduling of multi-word multiplications and accumulations is often 
difficult using today's instruction sets due to the multiplier latency. The umulxck 
instruction performs the multiply-accumulate-chaining operation which combines 
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add-chaining and multiply-chaining in one operation and avoids the multiplier 
latency. Using the umulxck instruction, and referring again to FIG. 5, the 
calculation of a 64x1024 bit partial product (yO*X) and the accumulation with a 
previous partial product (sl6:s0), can be accomplished in the following 20 
instructions: 



set register k=y0; 

umulxck 0,0; // clear extended-carry register exc first 

rO = umulxck xO, sO; 



rl5 = umulxck xl5, sl5; 

rl6 = umulxck 0, si 6; // catch 64 carry out bits 

rl7 = umulxck 0, 0; // catch last carryout bit 

As described in Applicants' specification, each single umulxck instruction in this 
example performs both multiply and accumulate operations and also implicitly allows for 
accumulating an additional partial result of a previous single umulxck instruction , i.e. 
without requiring additional add operations or specifying an additional operand for the 
additional accumulate operation. Specifically, a umulxck instruction of the form shown 
above (rd = umulxck rsl, rs2 - where rsl and rs2 are source operands and rd is a 
destination operand), performs (rsl*k + rs2 + a partial result of a previous single 
arithmetic instruction, such as a previous umulxck instruction) without explicitly 
specifying the additional operand (the partial result of the previous single arithmetic 
instruction). The result register specified by operand rd receives the lower order bits of 
the result of this calculation, and the extended carry register (which is also not specified 
by any of the operands of the instruction) receives the upper order bits of the result of this 
calculation. 
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In the example referenced by the Examiner, the third single arithmetic instruction 
(rO = umulxck xO, sO): 

performs the calculations (xO*k + sO + exc), where exc = 0 (because the second 

instruction above cleared exc) 
stores the lower order bits of the result of this calculation in rO; and 
stores the upper order bits of the result of this calculation in exc. 

In this example, the fourth single arithmetic instruction (not shown, but implied to 
be (rl = umulxck xl, si): 

performs the calculations (xl*k + si + exc), where exc now contains the upper 

order bits of the result of the calculation performed for the third 

instruction, as noted above ; 
stores the lower order bits of the result of this calculation in rO; and 
stores the upper order bits of the result of this calculation in exc. 

Similarly, each successive single umulxck instruction in this example performs a 
similar calculation on the explicitly specified source operands and also implicitly adds the 
contents of exc that were stored in exc by the previous umulxck instruction as the high 
order bits of the calculation made for that previous umulxck instruction . 

As described in detail above, the Examiner's characterization of the teachings of 
Applicants' Specification is clearly incorrect. The cited portions of Applicants' 
Specification describe a specific processor instruction (umulxck). When a single instance 
of this specific processor instruction is executed by the processor, the processor performs 
specific multiply and accumulate calculations , including the implicit addition of a partial 
result of a previously executed single instance of the umulxck instruction . In other 
words, as noted above, the execution of one of the processor instructions recited in 
Applicants' claims facilitates a feedback relationship between two successive single 
arithmetic instructions , not merely the performance of a collection of arithmetic 
operations in response to a single initiated instruction, or a general concept of operation 
chaining in performing a single arithmetic instruction. Because execution of these 
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instructions facilitates this feedback relationship, the chaining of successive single 
instances of this umulxck instruction can be used to perform (and accelerate) the multi- 
word multiplies and accumulates that are common in cryptography applications. 

In the Response to Arguments section of the present Office Action, the Examiner 
also submits, "The Huppenthal prior art discloses architecture for chaining a very large 
number of arithmetic operations (such as multiplication, accumulation, and etc) to form a 
single arithmetic instruction. The single arithmetic instruction is initiated and the entire 
sequence of chained instructions is performed with operand transfer controlled by the 
architecture via the usage of a chain port mechanism. The chain port mechanism supplies 
operands to each successive arithmetic operation in the sequence. The chain mechanism 
supplies operands without any support from the processor. The Hinds prior art discloses 
multiply-accumulate arithmetic operation sequences and its combination with Huppenthal 
discloses these arithmetic operational sequences completed as a single arithmetic 
instructions." Applicants assert that the Examiner is clearly mischaracterizing the 
teachings of both Huppenthal and Hinds in suggesting that the combination teaches 
Applicants' claimed invention. 

Huppenthal is directed to a multiprocessor computer architecture incorporating a 
number of memory algorithmic processors ("MAP") in the memory subsystem or closely 
coupled to other processing elements (e.g., one or more fixed instruction set 
microprocessor-based processors) to enhance overall system processing speed. The 
memory algorithmic processor architecture is an assembly that contains field 
programmable gate arrays (FPGAs) functioning as the memory algorithmic processors. 
In other words, each memory algorithmic processor includes one or more FPGAs that are 
configurable to perform various functions that are not performed by one of the 
instructions of the fixed instruction microprocessor-based processors in the system . As 
described in the Summary section of Huppenthal, a MAP element can function 
autonomously from its host system (i.e. without processor intervention) once its operands 
have been loaded . MAP elements (which are not processor instructions , nor do they 
represent circuitry to execute arithmetic instructions of a microprocessor instruction set ) 
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can be chained together to forward operands from one to another when multiple MAP 
elements are working together to perform a very large function. As described in detail in 
Huppenthal, the MAP architecture, and FPGAs thereof, perform functions in response to 
commands written to them using a write instruction that specifies a command and 
operands in the data. This is clearly not analogous to the limitations of Applicants' 
claims, which require execution of single arithmetic instructions of a processor 
instruction set in succession. For example, Table 2 of Huppenthal (and the 
accompanying description) describe commands that are communicated to the MAP 
architecture by issuing a write instruction from processor 12. These commands are used 
to configure various elements of the MAP architecture (e.g., RMB, RUC, RECON, 
LDROM), to pass operands to the MAP architecture (e.g., WRTOP, LASTOP), and to 
initiate the performance of a function implemented in the MAP circuitry (e.g., START). 

While performing a function using the MAP circuitry of Huppenthal may 
involve performing a series of calculations or other functions using one or more 
FPGAs, this teaches absolutely nothing about Applicants' claimed invention. 

Applicants' claims are not directed to the general concept of performing a collection of 
arithmetic operations to carry out a single arithmetic instruction, as the Examiner implies, 
or to the chaining of operations performed within a MAP architecture or FPGA in 
response to a command issued using one or more write instructions of a processor, but to 
specific arithmetic instructions in a processor's instruction set , individual instances of 
which implicitly add partial results of a previously executed single arithmetic instruction 
when executing a current single arithmetic instruction without explicitly specifying the 
partial result as an operand of the current single arithmetic instruction. 

To establish a prima facie obviousness of a claimed invention, all claim 
limitations must be taught or suggested by the prior art. In re Royka, 490 F.2d 981, 180 
U.S.P.Q. 580 (C.C.P.A. 1974), MPEP 2143.03. The cited art does not teach or suggest 
all limitations of the currently pending claims, some distinctive limitations of which are 
set forth in more detail below. 
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Regarding claim 1, the cited art fails to teach or suggest in response to executing a 
single arithmetic instruction of a processor instruction set, multiplying a first number by 
a second number; and adding implicitly a partial result from a previously executed single 
arithmetic instruction of the processor instruction set to generate a result that represents 
the first number multiplied by the second number summed with the partial result, wherein 
the partial result comprises a high order portion of a result of the previously executed 
single arithmetic instruction, and wherein the single arithmetic instruction does not 
include an explicit source operand for specifying the partial result; storing at least a 
portion of the generated result; and using the stored at least a portion of the generated 
result in a subsequent computation in the cryptography application. 

The Examiner submits, "Huppcnthal discloses a method implemented in a device, 
the method comprising: executing a single arithmetic instruction of a processor 
instruction set , and storing at least a portion of the generated result; and using the stored 
at least a portion of the generated result in a subsequent computation in the cryptography 
application" (emphasis Examiner's). Specifically, the Examiner cites Huppenthal (in 
column 3, lines 1-7) as disclosing "number of MAP elements chained together to 
accomplish a single function or operation (implies a single arithmetic instruction)." The 
Examiner's remarks regarding the implication of a single arithmetic instruction of a 
processor instruction set are completely unsupported by the reference itself. In fact, 
as discussed above, the memory algorithmic processors taught by Huppenthal are 
explicitly disclosed as being separate from the fixed instruction set microprocessors in the 
system (such as processors \2\_- 12 n ). Instead, they act as co-processors implemented in 
FPGAs or another type of user array to perform algorithmic functions in response to 
commands issued to them using a write instruction of the fixed instruction set processor, 
which is clearly not an arithmetic instruction of the fixed instruction set processor. 

The Examiner cites Huppenthal (in column 3, lines 18-25) as disclosing, "MAP 
elements can receive operands via chained port)" and (in column 18, line 66 - column 19, 
line 3) as disclosing, "output data from one MAP element to be sent directly to the user 
array of the next MAP element with no processor intervention via a chain port." 
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Applicants again assert that the chaining of MAP elements to perform a large function 
coded in FPGAs and initiated by a write instruction teaches nothing about the limitations 
of Applicants' claim. For example, it does not teach, in response to executing a single 
arithmetic instruction of a processor instruction set, multiplying a first number by a 
second number; and adding implicitly a partial result from a previously executed single 
arithmetic instruction of the processor instruction set. The chaining of MAP elements 
and passing of operands between MAP elements over a chain port when performing a 
single large operation implemented in one or more FPGAs, as described by Huppenthal, 
teach nothing about feedback between arithmetic instructions of a processor's instruction 
set . 

The Examiner admits that Huppenthal does not specifically disclose a multiplying 
and summing sequence and relies on Hinds to teach such a sequence. Applicants again 
note that claim 1 does not merely require "a multiplying and summing sequence" but also 
requires feedback between two different arithmetic instructions of the processor 
instruction set, i.e. adding implicitly a partial result from a previously executed single 
arithmetic instruction . The Examiner submits that Hinds discloses multiplying a first 
number by a second number; and adding implicitly a partial result from an executed 
arithmetic instruction to generate a result that represents the first number multiplied by 
the second number summed with the partial result, wherein the partial result comprises a 
high order portion of a result of the previously executed single arithmetic instruction. 
Specifically, the Examiner cites Hinds (in column 3, lines 5-17) as disclosing, "applying 
a multiply-accumulate operation to operands; multiplying operands and adding 
multiplication results" and (in column 8, lines 6-25) as disclosing, "chained multiply- 
accumulate operation; carry save adders and usage of partial multiplier (partial results: 
high order bits); output of results of partial multiplier is normalized and passed to carry 
save adders and final product adder)." Hinds is directed to an apparatus and method for 
implementing a floating point MAC instruction that takes three operands as inputs. The 
cited portions of Hinds describe the operation of an individual floating point MAC 
instruction , which includes both a multiplication and an addition operation. As described 
in Hinds, all three operands are explicitly specified for the MAC instruction; there is no 
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additional operand that is implicitly added during execution of Hinds' MAC instruction, 
i.e. adding implicitly a partial result from a previously executed single arithmetic 
instruction... wherein the single arithmetic instruction does not include an explicit source 
operand for specifying the partial result. The Examiner's citation in column 8 refers to a 
prior art "chained" multiply-accumulate FPU, illustrated in FIG. 3B, which is operable to 
implement a single floating point multiply-accumulate operation (A+(B*C)) or floating- 
point multiply-subtract operation (-A+(B*C)) in response to issuance of a single floating- 
point instruction. This "chained" multiply-accumulate FPU passes results of a partial 
multiplier to a carry save adder and final product adder as part of executing a single 
floating point MAC instruction . This partial result is not implicitly added as part of 
executing a subsequent arithmetic instruction , as in Applicants' claimed invention, nor is 
it stored and used in a subsequent computation in the cryptography application . 

The Examiner submits that it would have been obvious to one of ordinary skill in 
the art to modify Huppenthal for supporting cryptographic application as taught by 
Hinds, stating, "One of ordinary skill in the art would have been motivated to employ the 
teachings of Hinds to specifically increase the speed of multiply-accumulate operations 
and achieve faster computations. (Hinds col. 1, lines 27-29)." The cited passage of 
Hinds describes "interest in developing FPUs arranged specifically to perform multiply- 
accumulate operations with increased speed." However, contrary to the Examiner's 
suggestion, Hinds does not mention anything about supporting cryptography applications . 
In addition, it is not clear how the Examiner means to incorporate the teachings of Hinds 
into the multiprocessor system of Huppenthal, or how that would improve performance 
of multiply-accumulate instructions, which are not described anywhere in Huppenthal. 
Does he mean to imply that the floating point MAC instruction of Hinds (and associated 
FPU logic) should be added to the instruction set (and processor circuitry) of the fixed- 
instruction processors (e.g., processor 12) in the system of Huppenthal? In this case, the 
Examiner's citation regarding the MAP architecture would not be applicable to this new 
instruction, since it would not be used to implement it. Or does the Examiner mean to 
imply that the FPU of Hinds could be implemented using the MAP architecture (e.g., in 
one or more FPGAs) such that a floating point MAC operation is invoked by commands 
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issued to the MAP architecture using a write instruction ? Again, this would not teach or 
suggest using the MAP architecture to implement an arithmetic instruction of the 
processor's instruction set , as the Examiner has previously suggested. 

In addition, since the floating point MAC instruction of Hinds does not implicitly 
add a partial result from the previous execution of another floating point MAC (or a 
partial result from the previous execution of any other arithmetic instruction in the 
processor's instruction set), as suggested by the Examiner and as required by Applicant' 
claim, no combination of Huppenthal and Hinds would result in Applicants' claimed 
invention . At most, it could result in a system in which the floating point MAC operation 
of Hinds (which does not include such feedback) is implemented algorithmically in one 
or more FPGAs of the MAP architecture of Huppenthal, e.g., if such an instruction is not 
included in the instruction set of the fixed-instruction processors of Huppenthal. 

The Examiner admits that Huppenthal-Hinds does not specifically disclose 
supporting a cryptography application and relies on Chen to disclose supporting a public- 
key cryptography application (citing Chen, column 6, lines 23-25, and "arithmetic 
operations to support acceleration of cryptographic functions"). The Examiner states, "It 
would have been obvious to one of ordinary skill in the art to modify Huppenthal-Hinds 
to support a cryptographic application as taught by Chen. One of ordinary skill in the art 
would have been motivated to employ the teachings of Chen to greatly improve the 
performance of cryptographic circuits. (Chen col. 5, lines 40-42)." The Examiner seems 
to imply that the reason to modify Huppenthal-Hinds to support cryptographic 
applications is to improve the performance of something that apparently does not exist in 
Huppenthal-Hinds (cryptographic circuits) for something that the Examiner admits that 
Huppenthal-Hinds does not support (i.e. cryptographic applications). Therefore, the 
Examiner has not provided a valid reason to combine the references. In addition, as 
described in detail above, the cited art does not teach all of the limitations of Applicants' 
claims, whether taken alone or in combination. 
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For at least the reasons stated above, Applicants assert that the Examiner has 
failed to establish a prima facie rejection of claim 1. 

Independent claims 43, 57, and 64 include limitations similar to those recited in 
claim 1 and discussed above, and were rejected for similar reasons. Therefore, the 
arguments presented above apply with equal force to these claims, as well. 

Independent claim 18 includes limitations similar to those recited in claim 1 and 
discussed above, and was rejected for reasons similar to those discussed above regarding 
claim 1. Therefore, Applicants traverse this rejection for at least the reasons presented 
above regarding limitations in this claim that are similar to those in claim 1 . 

In addition, claim 18 recites adding a third number to generate a result that 
represents the first number multiplied by the second number summed with the partial 
result and the third number. The Examiner submits that Hinds discloses these limitations 
using the same citations and reasoning as those included in remarks directed to claim 1 . 
However, as discussed in detail above, Hinds teaches a floating point MAC instruction of 
the form (A+(B*C)), for which operands A, B, and C are explicitly specified. Hinds does 
not disclose a MAC instruction that adds partial results of a previously executed 
instruction , when executing this instruction, to generate a result that represents the first 
number multiplied by the second number summed with the partial result and the third 
number, as required by claim 18. 

For at least the reasons above, the rejection of claim 18 is unsupported by the 
cited art and removal of the rejection thereof is respectfully requested. 

Claims 50, 61, and 65 include limitations similar to those recited in claims 1 and 
18 and discussed above, and were rejected for the same reasons as claims 1 and 18. 
Therefore, the arguments presented above apply with equal force to these claims, as well. 
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Applicants assert that numerous ones of the dependent claims recite further 
distinctions over the cited art. Applicants traverse the rejection of these claims for at 
least the reasons given above in regard to the claims from which they depend. However, 
since the rejections have been shown to be unsupported for the independent claims, a 
discussion of the dependent claims is not necessary at this time. Applicants reserve the 
right to present additional arguments. 
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CONCLUSION 



Applicants submit the application is in condition for allowance, and an early 
notice to that effect is requested. 



If any fees are due, the Commissioner is authorized to charge said fees to 
Meyertons, Hood, Kivlin, Kowert, & Goetzel, P.C. Deposit Account No. 501505/6000- 
32301/RCK. 



Respectfully submitted, 

/Robert C. Kowert/ 

Robert C. Kowert, Reg. #39,255 
Attorney for Applicant(s) 



Meyertons, Hood, Kivlin, Kowert, & Goetzel, P.C. 

P.O. Box 398 

Austin, TX 78767-0398 

Phone: (512) 853-8850 

Date: May 4, 2010 
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